AITopics

Country:

Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Neural Information Processing Systems Feb-8-2026, 22:46:44 GMT

Entropic Desired Dynamics for Intrinsic Control: Supplemental Material

Steven Hansen

While this is not close to the state-of-the-art in general (cf. ... Figure 2 shows the effect of action entropy on exploratory behavior in Montezuma's Revenge. Number of unique avatar positions visited. Full training curves across all 6 Atari games are shown in Figure 1, including the random policy baseline. To ensure this didn't hamper performance, we ... At each state visited by the agent evaluator during training, the agent's state (consisting of the avatar's ... The full curves are included for completeness. The compute cluster we performed experiments on is heterogeneous, and has features such as host-sharing, adaptive load-balancing, etc.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.52)

Neural Information Processing Systems Oct-11-2025, 00:16:23 GMT

2bde8fef08f7ebe42b584266cbcfc909-Paper-Conference.pdf

neuromop network, terminal state, variability, (17 more...)

Country: Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Neural Information Processing Systems Oct-9-2025, 03:25:42 GMT

Evaluating the Moral Beliefs Encoded in LLMs

large language model, machine learning, natural language, (18 more...)

Country:

Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(4 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Neural Information Processing Systems Aug-14-2025, 18:42:44 GMT

5f7f02b7e4ade23430f345f954c938c1-Supplemental.pdf

action entropy, arxiv preprint arxiv, eddict, (14 more...)

Country: Oceania > Australia > New South Wales > Sydney (0.04)

Industry: Leisure & Entertainment > Games (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

arXiv.org Artificial Intelligence Jun-11-2025

DemoSpeedup: Accelerating Visuomotor Policies via Entropy-Guided Demonstration Acceleration

Guo, Lingxiao, Xue, Zhengrong, Xu, Zijing, Xu, Huazhe

Imitation learning has shown great promise in robotic manipulation, but the policy's execution is often unsatisfactorily slow due to commonly tardy demonstrations collected by human operators. In this work, we present DemoSpeedup, a self-supervised method to accelerate visuomotor policy execution via entropy-guided demonstration acceleration. DemoSpeedup starts from training an arbitrary generative policy (e.g., ACT or Diffusion Policy) on normal-speed demonstrations, which serves as a per-frame action entropy estimator. The key insight is that frames with lower action entropy estimates call for more consistent policy behaviors, which often indicate the demands for higher-precision operations. In contrast, frames with higher entropy estimates correspond to more casual sections, and therefore can be more safely accelerated. Thus, we segment the original demonstrations according to the estimated entropy, and accelerate them by down-sampling at rates that increase with the entropy values. Trained with the speedup demonstrations, the resulting policies execute up to 3 times faster while maintaining the task completion performance. Interestingly, these policies could even achieve higher success rates than those trained with normal-speed demonstrations, due to the benefits of reduced decision-making horizons. Project Page: https://demospeedup.github.io/
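The segment-then-downsample step described in the abstract can be illustrated with a minimal sketch. It assumes per-frame entropy estimates are already available (in the paper they come from a trained generative policy) and uses a single hypothetical `threshold` with two hypothetical rates, `low_rate` and `high_rate`, as a coarse stand-in for rates that increase with entropy. None of these names are taken from the DemoSpeedup code.

```python
from typing import List, Sequence, Tuple


def segment_by_entropy(
    entropy: Sequence[float], threshold: float
) -> List[Tuple[int, int, bool]]:
    """Split frame indices into contiguous runs of high- or low-entropy frames.

    Returns (start, end, is_high) triples with `end` exclusive.
    """
    segments = []
    start = 0
    for i in range(1, len(entropy) + 1):
        # Close the current run at the end of the sequence or when the label flips.
        if i == len(entropy) or (entropy[i] > threshold) != (entropy[start] > threshold):
            segments.append((start, i, entropy[start] > threshold))
            start = i
    return segments


def speed_up(
    frames: Sequence,
    entropy: Sequence[float],
    threshold: float,
    low_rate: int = 1,   # assumed: keep every frame in precise sections
    high_rate: int = 3,  # assumed: keep every third frame in casual sections
) -> list:
    """Down-sample each segment at a rate chosen from its entropy label."""
    kept = []
    for start, end, is_high in segment_by_entropy(entropy, threshold):
        rate = high_rate if is_high else low_rate
        kept.extend(frames[start:end:rate])
    return kept


# Toy demonstration: 10 frames, with a high-entropy stretch in the middle.
demo_frames = list(range(10))
demo_entropy = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
accelerated = speed_up(demo_frames, demo_entropy, threshold=0.5)
```

In this toy case the low-entropy frames at both ends are all kept, while the six high-entropy frames in the middle are reduced to two. A closer match to the abstract would map entropy to several rates rather than two.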

artificial intelligence, demonstration, machine learning, (14 more...)

2506.05064

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Jul-26-2023

Evaluating the Moral Beliefs Encoded in LLMs

Scherrer, Nino, Shi, Claudia, Feder, Amir, Blei, David M.

This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question-wording. (c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.
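To make the three quantities in component (1) concrete, here is a minimal sketch of how a choice probability, its uncertainty, and its consistency across question wordings might be estimated from sampled answers. It assumes each raw LLM response has already been mapped to action "A" or "B", and it uses simplified stand-ins (a sample frequency, a binary entropy, and a majority-agreement rate) rather than the paper's own definitions. The function names and the question-form labels are hypothetical.

```python
import math
from typing import Dict, List


def action_likelihood(choices: List[str], action: str = "A") -> float:
    """Fraction of sampled responses that selected `action`."""
    return sum(c == action for c in choices) / len(choices)


def binary_entropy(p: float) -> float:
    """Entropy in nats of a two-way choice with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def summarize(by_form: Dict[str, List[str]]) -> Dict[str, float]:
    """Average over question forms, then report likelihood, entropy, consistency."""
    per_form = [action_likelihood(choices) for choices in by_form.values()]
    p_a = sum(per_form) / len(per_form)
    overall_prefers_a = p_a >= 0.5
    # Consistency here: share of question forms whose majority matches the overall choice.
    agree = sum((p >= 0.5) == overall_prefers_a for p in per_form)
    return {
        "p_action_a": p_a,
        "entropy": binary_entropy(p_a),
        "consistency": agree / len(per_form),
    }


# Hypothetical samples for one scenario under three question wordings.
samples = {
    "form_1": ["A", "A", "A", "B"],
    "form_2": ["B", "B", "B", "A"],
    "form_3": ["A", "A", "A", "A"],
}
summary = summarize(samples)
```

Here the model picks action A about two thirds of the time overall, but one of the three wordings flips its majority answer, which is the kind of wording sensitivity finding (b) refers to.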

large language model, machine learning, natural language, (18 more...)

2307.14324

Country:

Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

arXiv.org Artificial Intelligence Sep-5-2021

Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially Observable Environments

Ye, Zhenhui, Jiang, Xiaohong, Song, Guanghua, Yang, Bowei

Recent progress in multi-agent deep reinforcement learning (MADRL) makes it more practical for real-world tasks, but its relatively poor scalability and the constraints of partial observability raise challenges to its performance and deployment. Based on our intuitive observation that human society can be regarded as a large-scale partially observable environment, where each individual communicates with neighbors and remembers its own experience, we propose a novel network structure called hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use the hierarchical graph attention network (HGAT) to achieve communication between neighboring agents, and exploit GRU to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method to learn stochastic policies with a configurable target action entropy. Based on the above techniques, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HGRN. Experimental results on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements over four baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model. Ablation studies prove the function and necessity of each component.
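The idea of a configurable target action entropy can be sketched for a single agent with a softmax policy over Q-values. The abstract does not give the update rule, so the version below is an assumption: it adjusts a temperature `alpha` until the policy entropy matches the target, in the spirit of the automatic temperature adjustment used in soft actor-critic. All function and parameter names are hypothetical.

```python
import math
from typing import List


def softmax_policy(q_values: List[float], alpha: float) -> List[float]:
    """Boltzmann policy: pi(a) is proportional to exp(Q(a) / alpha)."""
    scaled = [q / alpha for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def tune_temperature(
    q_values: List[float],
    target_entropy: float,
    lr: float = 0.1,
    steps: int = 2000,
) -> float:
    """Move log(alpha) up when entropy is below target and down when above."""
    log_alpha = 0.0
    for _ in range(steps):
        h = entropy(softmax_policy(q_values, math.exp(log_alpha)))
        log_alpha += lr * (target_entropy - h)
    return math.exp(log_alpha)


# Toy example: four actions, target entropy of 1.0 nat (the maximum is ln 4).
q = [1.0, 0.5, 0.0, -0.5]
alpha = tune_temperature(q, target_entropy=1.0)
tuned_entropy = entropy(softmax_policy(q, alpha))
```

Because the entropy of a softmax policy rises monotonically with the temperature, this simple feedback rule settles at the temperature that yields the requested entropy, provided the target lies between 0 and the log of the number of actions.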

agent, communication, information, (13 more...)

2109.02032

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Feb-22-2021

Action Redundancy in Reinforcement Learning

Baram, Nir, Tennenholtz, Guy, Mannor, Shie

Maximum Entropy (MaxEnt) reinforcement learning is a powerful learning paradigm which seeks to maximize return under entropy regularization. However, action entropy does not necessarily coincide with state entropy, e.g., when multiple actions produce the same transition. Instead, we propose to maximize the transition entropy, i.e., the entropy of next states. We show that transition entropy can be described by two terms, namely model-dependent transition entropy and action redundancy. In particular, we explore the latter in both deterministic and stochastic settings and develop tractable approximation methods in a near model-free setup. We construct algorithms to minimize action redundancy and demonstrate their effectiveness on a synthetic environment with multiple redundant actions as well as contemporary benchmarks in Atari and MuJoCo. Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
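The gap between action entropy and next-state entropy is easy to see in a toy deterministic setting. The sketch below assumes a single state with four hypothetical actions, three of which lead to the same next state. It checks the standard identity H(a) = H(s') + H(a | s') for deterministic transitions, where the last term plays the role of redundancy. The paper's exact definition of action redundancy may differ from this conditional entropy.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable


def entropy(probs: Iterable[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_state_distribution(
    policy: Dict[str, float], next_state: Dict[str, str]
) -> Dict[str, float]:
    """Push the action distribution through a deterministic transition map."""
    dist = defaultdict(float)
    for action, p in policy.items():
        dist[next_state[action]] += p
    return dict(dist)


def action_entropy_given_next_state(
    policy: Dict[str, float], next_state: Dict[str, str]
) -> float:
    """H(a | s'): leftover uncertainty about the action once s' is known."""
    total = 0.0
    for s, p_s in next_state_distribution(policy, next_state).items():
        posterior = [p / p_s for a, p in policy.items() if next_state[a] == s]
        total += p_s * entropy(posterior)
    return total


# Hypothetical setup: uniform policy, actions a0..a2 are redundant.
POLICY = {"a0": 0.25, "a1": 0.25, "a2": 0.25, "a3": 0.25}
NEXT_STATE = {"a0": "s1", "a1": "s1", "a2": "s1", "a3": "s2"}

h_action = entropy(POLICY.values())
h_next = entropy(next_state_distribution(POLICY, NEXT_STATE).values())
h_redundant = action_entropy_given_next_state(POLICY, NEXT_STATE)
```

The uniform policy has the maximum action entropy of ln 4 (about 1.386 nats), yet the next-state entropy is only about 0.562 nats. The remaining 0.824 nats are spent distinguishing between actions that have the same effect.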

action redundancy, entropy, redundancy, (13 more...)

2102.11329

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence Dec-30-2009

A conversion between utility and information

Ortega, Pedro A., Braun, Daniel A.

Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely determined by the negative information content. To analyze stochastic processes, we define the utility of a realization as its reward rate. Under this interpretation, we show that the expected utility of a stochastic process is its negative entropy rate. Furthermore, we apply our results to analyze agent-environment interactions. We show that the expected utility that will actually be achieved by the agent is given by the negative cross-entropy from the input-output (I/O) distribution of the coupled interaction system and the agent's I/O distribution. Thus, our results allow for an information-theoretic interpretation of the notion of utility and the characterization of agent-environment interactions in terms of entropy dynamics.
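The main claims of the abstract can be written out as a short worked derivation for a single random outcome. The symbols r, P, and Q, the positive scaling constant α, and the use of the natural logarithm are notation chosen here for illustration. The paper states its results for stochastic processes (reward rates and entropy rates), so this one-shot version is a simplification.

```latex
% Reward as negative information content, up to a positive scale \alpha.
% Order-preserving: P(x) > P(y) implies r(x) > r(y).
r(x) = \alpha \log P(x), \qquad \alpha > 0 .

% Additivity over independent events follows from the logarithm:
r(x, y) = \alpha \log \bigl( P(x)\, P(y) \bigr) = r(x) + r(y) .

% The expected reward under P is the negative entropy:
\mathbb{E}_{P}[r] = \alpha \sum_{x} P(x) \log P(x) = -\alpha\, H(P) .

% If outcomes follow the coupled system's distribution P while rewards are
% computed from the agent's own distribution Q, the expected reward is the
% negative cross-entropy:
\mathbb{E}_{P}[r_Q] = \alpha \sum_{x} P(x) \log Q(x) = -\alpha\, H(P, Q) .
```

For example, with α = 1 a fair coin assigns reward log(1/2) to each outcome, so the expected reward is -log 2, which is the negative of the coin's entropy.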

agent, artificial intelligence, game theory, (19 more...)

0911.5106

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.90)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)